Primary exercises

Apply the following to the survey data:

  1. Select personal information {name, age, gender, height} into a new tibble survey_personal_info.

  2. Select the same personal information as in the previous exercise into a new tibble survey_personal_info, but capitalise the first letter of each variable name, e.g. Name, Age, etc.

  3. Reorder the variables in the survey dataset such that name, age and gender appear as the first, second and third columns, followed by the remaining variables.

  4. Deselect the variables that relate to the hand and/or arm (e.g. span1, span2, hand, etc.). See also the description of the survey data.

  5. Select the top 20 names along with gender.

  6. Reproduce the following tibbles (note that variables are renamed and reshuffled):

    6.1 First 5 observations.

    # A tibble: 5 × 13
      SPAN1 SPAN2 name   gender hand  fold  pulse clap  exerc…¹ smokes height m.i     age
      <dbl> <dbl> <chr>  <chr>  <chr> <chr> <dbl> <chr> <chr>   <chr>   <dbl> <chr> <dbl>
    1  18.5  18   Alyson female right right    92 left  some    never    173  metr…  18.2
    2  19.5  20.5 Todd   male   left  right   104 left  none    regul    178. impe…  17.6
    3  18    13.3 Gerald male   right left     87 neit… none    occas     NA  <NA>   16.9
    4  18.8  18.9 Robert male   right right    NA neit… none    never    160  metr…  20.3
    5  20    20   Dustin male   right neit…    35 right some    never    165  metr…  23.7
    # … with abbreviated variable name ¹​exercise

    6.2 Last 3 observations.

    # A tibble: 3 × 13
      Hand  Fold  Clap  name   gender span1 span2 pulse exerc…¹ smokes height m.i     age
      <chr> <chr> <chr> <chr>  <chr>  <dbl> <dbl> <dbl> <chr>   <chr>   <dbl> <chr> <dbl>
    1 right right right Tracey female  17.5  16.5    NA some    never    170  metr…  18.6
    2 right right right Keith  male    21    21.5    90 some    never    183  metr…  17.2
    3 right right right Celina female  17.6  17.3    85 freq    never    168. metr…  17.8
    # … with abbreviated variable name ¹​exercise

Extra exercises

  1. Rename the m.i variable to system.

  2. Select name along with all categorical variables into a new tibble survey_cats.

  3. Create a new tibble survey_nums with name and all numerical variables.

  4. For this exercise you’ll need an additional helper function, where(), explained
    here.

    4.1 Reproduce the result of the previous exercise (3) without spelling out all numerical variable names. Hint: you’ll also need the is.numeric function (see ?is.numeric for help).

    4.2 Select all non-numerical variables.

Selection by pattern matching

In data sets with a large number of variables, finding variables by hand becomes tedious. Several helper functions are available to speed up the search for variable names.

starts_with(), ends_with() and contains()

The functions help to find fixed patterns in variable names:
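As a minimal sketch of how these helpers behave, the example below uses a small made-up tibble called toy (not one of the course data sets; its column names and values are assumptions chosen purely for illustration) and selects variables by prefix, suffix and substring:

```r
library(dplyr)

# Hypothetical toy data: names and values are made up for illustration only
toy <- tibble(
  name    = c("Ann", "Bob"),
  age     = c(21, 34),
  alcohol = c("yes", "no"),
  pulse1  = c(70, 82),
  pulse2  = c(95, 101)
)

# Variables whose names start with "a"
toy_a <- select(toy, starts_with("a"))
names(toy_a)     # "age" "alcohol"

# Variables whose names end with "2"
toy_2 <- select(toy, ends_with("2"))
names(toy_2)     # "pulse2"

# Variables whose names contain "puls" anywhere
toy_puls <- select(toy, contains("puls"))
names(toy_puls)  # "pulse1" "pulse2"
```

Each helper is evaluated inside select() and matches against the variable names only, never against the values in the columns.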

The helper functions can be combined with the logical operators {!, |, &}, which will be explained later. You have already encountered one of them, the negation operator !, in the lecture on Useful R functions. In short, it complements the result. For example, above we selected the variables starting with the character ‘a’ with select(pulse, starts_with("a")), which resulted in a tibble with the two variables age and alcohol. Placing ! in front of the helper function produces the complement of the previous result, namely all variables that do not start with ‘a’:

Note that age and alcohol do not occur in the result.
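The effect of the negation can be sketched on a small made-up tibble as well. The tibble toy below is an assumption for illustration (it is not the pulse data set); it merely contains two variables starting with ‘a’ so the complement is easy to see:

```r
library(dplyr)

# Hypothetical toy data: names and values are made up for illustration only
toy <- tibble(
  name    = c("Ann", "Bob"),
  age     = c(21, 34),
  alcohol = c("yes", "no"),
  pulse1  = c(70, 82),
  pulse2  = c(95, 101)
)

# All variables that start with "a"
with_a <- select(toy, starts_with("a"))
names(with_a)     # "age" "alcohol"

# The complement: all variables that do NOT start with "a"
without_a <- select(toy, !starts_with("a"))
names(without_a)  # "name" "pulse1" "pulse2"
```

Together the two results cover every variable in toy exactly once, which is what complement means here.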

There are several other helper functions that fall beyond the scope of this lecture; visit here for more details.

  5. Select variables from the survey data by pattern matching.

    5.1 Select variables that end with ‘e’.

    5.2 Select variables that start with ‘s’.

    5.3 Select hand span variables using a helper function.



Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC